First Steps in Pointfree Functional Dependency Theory 
(Draft of May 14, 2005) 


J.N. Oliveira 

Dep. Informatica, Universidade do Minho, Campus de Gualtar, 4700-320 Braga, Portugal, 

jno@di.uminho.pt


Abstract. When software designers refer to the relational calculus, what they
usually mean is the set-theoretic kernel of relational database design “à la Codd”
and not the calculus of binary relations which was initiated by De Morgan in the
1860s.

Contrary to the intuition that a binary relation is just a particular case of n-ary
relation, this paper shows the effectiveness of the former in “explaining” and
reasoning about the latter. The theory of functional dependencies, which is central
to such database design techniques, is addressed in a pointfree style instead of
reasoning in the standard set-theoretic model “à la Codd”.

It turns out that the theory becomes more general and considerably simpler. Elegant
expressions replace lengthy formulas and easy-to-follow calculations replace
pointwise proofs with lots of “. . .” notation, case analyses and natural language
explanations for “obvious” steps.


1 Introduction 

In standard relational data processing, objects are recorded by assigning values to their
observable properties or attributes. A database file is a collection of attribute assignments,
one per object. Displayed in a bi-dimensional tabular format, each object corresponds
to a tuple of values, or row (eg. row 10 in some Excel spreadsheet), and each
column lists the values of a particular attribute in all tuples (eg. column “A” in the same
spreadsheet). All values of a particular attribute (say $i$) are of the same type (say $A_i$).
For $n$ such attributes, a relational database file $R$ can be regarded as a set of n-tuples,
that is, $R \subseteq A_1 \times \dots \times A_n$. A relational database is a collection of several such n-ary
relations.

When software designers refer to the relational calculus, by default what is understood
is the calculus of n-ary relations studied in logics and database theory, and not
the calculus of binary relations which was initiated by De Morgan in the 1860s [15] and
eventually became the core of the algebra of programming [1, 4, 3].

According to [11], it was Quine, in his 1932 Ph.D. dissertation, who showed how
to develop the theory of n-ary relations for all n simultaneously, by defining ordered n-tuples
in terms of the ordered pair. (Norbert Wiener is apparently the first mathematician
to publicly identify, in the 1910s, n-ary relations with subsets of n-tuples.) Since the
1970s, the information system community has been indebted to Codd for his pioneering work
on the foundations of the relational data model theory [5].
Codd discovered and publicized procedures for constructing a set of simple n-ary
relations which can support a set of given data and constructed an extension of the calculus
of binary relations capable of handling most typical data retrieval problems. Since
then, relational database theory has been thoroughly studied, and several textbooks are
available on the topic, namely [12], [17] and [6].

The common understanding is that binary relations are just n-ary relations, for 
n = 2, and so there seems to be little point in explaining n-ary relational theory in 
terms of binary relations. As a matter of fact, when Codd talks about the binary relation 
representation of an n-ary relation in [5], one has the feeling that there are more dis- 
advantages than advantages in such a representation. Contrary to these intuitions, this 
paper aims at showing that such a strategy makes sense, at least in functional depen- 
dency theory, the subset of database theory actually addressed in this paper. (Outside 
the database context, functional dependencies have been used to solve ambiguities in 
multiple parameter type classes in the Haskell type system [10].) 

Classical pointwise relational database theory is full of lengthy formulae, and proofs
with lots of “. . .” notation, case analyses and English explanations for “obvious” steps.
We show that the adoption of the (pointfree) binary relation calculus is beneficial in
several respects. First, the fact that pointfree notation abstracts from “points” or variables
makes the reasoning more compact and effective. Second, proofs are performed
by easy-to-follow calculations. Third, one is able to generalize the original theory, as
will happen with our generalization of attributes to arbitrary (suitably typed) functions
in functional dependencies and multi-valued dependencies.

Paper structure. This paper is structured as follows. First we introduce the standard 
notion of a functional dependency (FD) and revise the pointfree theory of functions. 
Both worlds are combined in section 5, where FDs are presented in the pointfree style. 
Sections 6 and 7 generalize (pointfree) FD-theory by moving from attribute projections 
to arbitrary functions. Sections 8 to 10 provide calculational proofs for the standard
FD-theory, including the Armstrong-axioms and the theorem of lossless decomposition.
Multi-valued dependencies are the subject of section 11. The remainder of the paper presents
our conclusions and prospects for future work and (in the appendices) some auxiliary
results.


2 What is a functional dependency? 

In an n-ary relation, attribute names normally replace natural numbers in the identifi- 
cation of attributes. The enumeration of all attribute names in a database relation, for 
instance 


$S = \{Pilot, Flight, Date, Departs\}$ (1)

concerning an airline scheduling system¹, is a finite set called the relation’s scheme.
This scheme captures the syntax of the data. What about semantics?

1 This example is taken from [12].

Even non-experts in airline scheduling will accept the following “business” rule: A
single pilot is assigned to a given flight, on a given date. This restriction is an example
of a so-called functional dependency (FD) among attributes, which can be stated more
formally as follows: attribute Pilot is functionally dependent on Flight and Date.
In the standard practice, this will be abbreviated by writing

$Flight\ Date \to Pilot$

which has the following, alternative reading: Flight and Date functionally determine
Pilot. Another FD in this example is

$Flight \to Departs$ (2)

since a given flight always departs at the same time. 

The addition of functional dependencies to a relational schema is comparable to the 
addition of axioms to an algebraic signature (eg. axioms such as pop(push(a, s)) = 
s adding semantics to the syntax of a stack datatype involving operators push and 
pop). How do we reason about such functional dependencies? Can we simplify a set of 
dependencies by removing the redundant ones, if any? How do we design a concrete 
database implementation from a relational schema and its dependencies? 

At the heart of relational database theory we find functional dependency (FD) the- 
ory, which is axiomatic in nature and stems from the definition of FD-satisfiability 
which follows. 

Definition 1. Given subsets $x, y \subseteq S$ of the relation scheme $S$ of a relation $R$, this
relation is said to satisfy functional dependency $x \to y$ iff all pairs of tuples $t, t' \in R$
which “agree” on $x$ also “agree” on $y$, that is,

$\langle \forall\, t, t' : t, t' \in R : t[x] = t'[x] \Rightarrow t[y] = t'[y] \rangle$ (3)

(Notation $t[x]$, meaning “the values in $t$ of the attributes in $x$”, will be scrutinized in the
sequel.) □
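
Definition 1 can be tried out directly on a small table. The sketch below is only an illustration: the four rows are made-up data over scheme (1), and the helper names `project` and `satisfies_fd` are hypothetical, not notation from the paper.

```python
# Pointwise check of formula (3) on a small, made-up airline table.
# Tuples are dicts from attribute name to value; the rows are illustrative only.

def project(t, attrs):
    """t[x]: the values in tuple t of the attributes in x."""
    return tuple(t[a] for a in attrs)

def satisfies_fd(rows, x, y):
    """True iff every pair of rows agreeing on x also agrees on y, as in (3)."""
    return all(
        project(t, y) == project(u, y)
        for t in rows
        for u in rows
        if project(t, x) == project(u, x)
    )

rows = [
    {"Pilot": "Cushing", "Flight": 83, "Date": "9 Aug", "Departs": "10:15"},
    {"Pilot": "Cushing", "Flight": 116, "Date": "10 Aug", "Departs": "13:25"},
    {"Pilot": "Clark", "Flight": 281, "Date": "8 Aug", "Departs": "05:50"},
    {"Pilot": "Clark", "Flight": 83, "Date": "11 Aug", "Departs": "10:15"},
]

fd_pilot = satisfies_fd(rows, ["Flight", "Date"], ["Pilot"])   # Flight Date -> Pilot
fd_departs = satisfies_fd(rows, ["Flight"], ["Departs"])       # Flight -> Departs
fd_bogus = satisfies_fd(rows, ["Pilot"], ["Flight"])           # Pilot -> Flight
print(fd_pilot, fd_departs, fd_bogus)
```

The nested quantification over pairs of rows is exactly the “two-dimensional” shape of (3); the third call fails because the same pilot appears on two different flights.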

The closure of a set of FDs is based on the so-called Armstrong axioms [12] which can 
be used as inference rules for FDs. Equivalent axioms have been found which make FD 
checking more efficient. 

Why has this theory “gone this way”? Perhaps one reason lies in the fact that formula
(3), with its logical implication inside a “two-dimensional” universal quantification,
is not particularly agile. Designs involving several FDs at the same time can be
hard to reason about.

This calls for a simplification of this very basis of FD-theory. The main purpose 
of this paper is to present an alternative, more general setting for FD-theory based on 
the pointfree binary relation calculus. It turns out that the theory becomes more general 
and considerably simpler, thanks to the calculus of simplicity and coreflexivity. (Details
about this terminology will be presented shortly.) 

We will start by reviewing some basic principles. Note that the qualifier “functional”
in “functional dependency” stems from “function”, of course. So our first effort goes
into making sure we have a clear idea of “what a function is”.

3 What is a function? — the Leibniz view

A function $f$ is a special case of binary relation satisfying two main properties:

- “Left” Uniqueness

  $b\,f\,a \wedge b'\,f\,a \Rightarrow b = b'$ (4)

- Leibniz principle

  $a = a' \Rightarrow f\,a = f\,a'$ (5)

It can be shown (see an exercise later on) that this is the same as saying that functions
are simple and entire relations, respectively:

- $f$ is simple:

  $\mathrm{img}\, f \subseteq id$ (6)

- $f$ is entire:

  $id \subseteq \ker f$ (7)


Formulae (6) and (7) are examples of pointfree notation in which points (eg. $a$, $a'$, $b$, $b'$
in (4,5)) disappear. (For instance, instead of writing $a = a'$, we identify the identity
relation $id$ which relates $a$ and $a'$ when they are the same.) In order to parse such
compressed formulae we need to understand the meaning of expressions such as $\ker f$
(read: “kernel of $f$”) and $\mathrm{img}\, f$ (read: “image of $f$”),

$\ker R = R^\circ \cdot R$ (8)

$\mathrm{img}\, R = R \cdot R^\circ$ (9)

whose definitions involve two basic relational combinators: converse ($R^\circ$) and composition
($R \cdot S$). The former converts a relation $R$ into $R^\circ$ such that $a(R^\circ)b$ holds iff
$bRa$ holds. (We write $bRa$ to mean that pair $(b, a)$ is in $R$.) The latter (composition) is
defined in the usual way: $b(R \cdot S)c$ holds wherever there exist one or more mediating
$a \in A$ such that $bRa \wedge aSc$, where $B \xleftarrow{R} A$ and $A \xleftarrow{S} C$ are two binary relations
on datatypes $A$ (source) and $B$ (target), and $C$ (source) and $A$ (target), respectively.
Converse commutes with composition in a contravariant way,

$(R \cdot S)^\circ = S^\circ \cdot R^\circ$ (10)

and so kernel and image commute via converse:

$\ker(R^\circ) = \mathrm{img}\, R$ (11)

$\mathrm{img}(R^\circ) = \ker R$ (12)

As in (6) and (7), the underlying partial order on relations is written $R \subseteq S$, meaning

$R \subseteq S \equiv \langle \forall\, b, a :: bRa \Rightarrow bSa \rangle$ (13)

for all $a$ and $b$ suitably typed.

The simple and entire classes of relation mentioned above are part of a wider binary
relation taxonomy,

[Figure: diagram of the binary relation taxonomy, ending in bijection (isomorphism)]

whose four top-level classification criteria are captured by the following table:

             Reflexive        Coreflexive
  ker R      entire R         injective R          (14)
  img R      surjective R     simple R

where $R$ is said to be reflexive iff it is at least the identity ($id \subseteq R$) and it is said to be
coreflexive (or a partial identity) iff it is at most the identity, that is, if $R \subseteq id$ holds.
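
The combinators (8) to (12) and the four criteria of table (14) are easy to experiment with on finite relations. The following is a minimal sketch, assuming relations are encoded as Python sets of pairs `(b, a)` read “b R a”; the function names and the two sample relations are illustrative choices, not part of the paper.

```python
# Relations are finite sets of pairs (b, a), read "b R a".

def conv(r):
    """Converse: a (R°) b holds iff b R a holds."""
    return {(a, b) for (b, a) in r}

def comp(r, s):
    """Composition: b (R . S) c holds iff b R a and a S c for some mediating a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def ker(r):
    return comp(conv(r), r)          # definition (8)

def img(r):
    return comp(r, conv(r))          # definition (9)

def ident(xs):
    return {(x, x) for x in xs}

def classify(r, source, target):
    """The four criteria of table (14) for a relation target <- source."""
    return {
        "entire": ident(source) <= ker(r),        # ker R is reflexive
        "injective": ker(r) <= ident(source),     # ker R is coreflexive
        "surjective": ident(target) <= img(r),    # img R is reflexive
        "simple": img(r) <= ident(target),        # img R is coreflexive
    }

A = {0, 1, 2, 3}
B = {"even", "odd"}
parity = {("even" if a % 2 == 0 else "odd", a) for a in A}   # a function B <- A
below = {(b, a) for b in A for a in A if b < a}              # "b is below a" on A

print(classify(parity, A, B))
print(classify(below, A, A))
```

Here `parity` comes out simple and entire (so it is a function), surjective but not injective, while `below` fails all four criteria.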

Coreflexive relations are fragments of the identity relation which can be used to
model predicates or sets. The meaning of a predicate $p$ is the coreflexive relation $[p]$
such that $b\,[p]\,a \equiv (b = a) \wedge (p\,a)$. This is the relation that maps every $a$ which
satisfies $p$ (and only such $a$) onto itself. The meaning of a set $S \subseteq A$ is the meaning of
its characteristic predicate $[\lambda a . a \in S]$, that is,

$b\,[S]\,a \equiv (b = a) \wedge a \in S$ (15)

Wherever clear from the context, we will drop brackets [ ].

Before we embark on converting (3) into pointfree notation, let us see an alternative 
view of functions better suited for calculation. 


4 What is a function? — the “Galois view” 

Shunting rules. To say that $f$ is a function is equivalent to stating either of the two Galois
connections which follow:

$f \cdot R \subseteq S \equiv R \subseteq f^\circ \cdot S$ (16)

$R \cdot f^\circ \subseteq S \equiv R \subseteq S \cdot f$ (17)

As a warming-up exercise, let us check one of these, say (16). (The whole picture
can be found in eg. [8, 4, 3].) That $f$ entire and simple implies equivalence (16) can be
proved by mutual implication:

      $f \cdot R \subseteq S$
  $\Rightarrow$   { monotonicity of composition }
      $f^\circ \cdot f \cdot R \subseteq f^\circ \cdot S$
  $\Rightarrow$   { $f$ is entire (7) }
      $R \subseteq f^\circ \cdot S$
  $\Rightarrow$   { monotonicity of composition }
      $f \cdot R \subseteq f \cdot f^\circ \cdot S$
  $\Rightarrow$   { $f$ is simple (6) }
      $f \cdot R \subseteq S$

That (16) implies that $f$ is entire and simple can be checked by instantiating $R, S := id, f$
(left-cancellation) or $S, R := id, f^\circ$ (right-cancellation), respectively.

Function converses enjoy a number of properties of which the following is singled
out because of its role in pointwise-pointfree conversion [2]:

$b(f^\circ \cdot R \cdot g)a \equiv (f\,b)\,R\,(g\,a)$ (18)

The use of (18) to convert (4, 5) into (6, 7), respectively, is left as an exercise.
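
Shunting rule (16) can also be confirmed by brute force on small carriers. The sketch below is an illustration under stated assumptions: relations are sets of pairs `(b, a)`, the three two-element carriers are arbitrary, and the helper names are hypothetical. It enumerates every $R$ and $S$ and checks the equivalence for every total function $f$, then shows that it breaks for a relation which is not entire.

```python
from itertools import combinations, product

def conv(r):
    return {(a, b) for (b, a) in r}

def comp(r, s):
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def all_relations(target, source):
    """Every relation target <- source, as a set of pairs (b, a)."""
    pairs = list(product(target, source))
    return [set(c) for n in range(len(pairs) + 1) for c in combinations(pairs, n)]

def all_functions(target, source):
    """Every total function target <- source, as a set of pairs (f a, a)."""
    source = list(source)
    return [set(zip(vals, source)) for vals in product(target, repeat=len(source))]

def shunting_16(f, rs, ss):
    """True iff (f . R <= S) == (R <= f° . S) for all R in rs and S in ss."""
    return all((comp(f, r) <= s) == (r <= comp(conv(f), s)) for r in rs for s in ss)

A, B, C = [0, 1], ["p", "q"], ["x", "y"]
rs = all_relations(A, C)      # candidates for R : A <- C
ss = all_relations(B, C)      # candidates for S : B <- C

functions_ok = all(shunting_16(f, rs, ss) for f in all_functions(B, A))
empty_ok = shunting_16(set(), rs, ss)    # the empty relation is simple but not entire
print(functions_ok, empty_ok)
```

The failure for the empty relation mirrors the second step of the calculation above, which is the one that needs $f$ to be entire.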

5 FD-satisfiability in pointfree style 

Attributes are functions. Let $R$ be an n-ary relation with schema $S$, $t$ be a tuple in $R$ and
$a$ be an attribute in $S$. Notation $t[a]$ was adopted in (3) to mean “the value of attribute $a$
in $t$”. This indicates that $a$ can be identified with the projection function which extracts
the value which $t$ exhibits as property $a$. Since this extends to a collection $x$ of attributes,
we can convert (3) into

$\langle \forall\, t, t' : t, t' \in R : (x\,t) = (x\,t') \Rightarrow (y\,t) = (y\,t') \rangle$

Assuming the universal quantification implicit, we reason:

      $t \in R \wedge t' \in R \wedge (x\,t) = (x\,t') \Rightarrow (y\,t) = (y\,t')$
  $\equiv$   { (18) twice, for $R := id$ }
      $t \in R \wedge t' \in R \wedge t(x^\circ \cdot x)t' \Rightarrow t(y^\circ \cdot y)t'$
  $\equiv$   { (15) twice }
      $t[R]u \wedge t = u \wedge t'[R]u' \wedge t' = u' \wedge t(x^\circ \cdot x)t' \Rightarrow t(y^\circ \cdot y)t'$
  $\equiv$   { $\wedge$ is commutative; substitution of equals for equals; converse }
      $t[R]u \wedge u(x^\circ \cdot x)u' \wedge u'[R]^\circ t' \Rightarrow t(y^\circ \cdot y)t'$
  $\equiv$   { composition; relation inclusion (13) }
      $[R] \cdot (x^\circ \cdot x) \cdot [R]^\circ \subseteq y^\circ \cdot y$ (19)
  $\equiv$   { shunting rules (16) and (17) }
      $y \cdot [R] \cdot x^\circ \cdot x \cdot [R]^\circ \cdot y^\circ \subseteq id$
  $\equiv$   { converse versus composition, $(R \cdot S)^\circ = S^\circ \cdot R^\circ$, followed by (9) }
      $\mathrm{img}(y \cdot [R] \cdot x^\circ) \subseteq id$

In summary: an n-ary relation $R$ as in definition 1 satisfies functional dependency $x \to y$
iff the binary relation

$y \cdot [R] \cdot x^\circ$ (20)

is simple, cf. (6).
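
The outcome of this calculation can be cross-checked mechanically. In the sketch below (made-up rows and hypothetical helper names), `fd_pointwise` follows (3) and `fd_pointfree` builds relation (20) from set-of-pairs encodings of $[R]$, $x$ and $y$ and tests its simplicity. The graphs of $x$ and $y$ are restricted to the rows of the table, which is harmless because (20) composes them with $[R]$ anyway.

```python
def conv(r):
    return {(a, b) for (b, a) in r}

def comp(r, s):
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def fd_pointwise(rows, x, y):
    """Formula (3), with attributes already read as functions on tuples."""
    return all(y(t) == y(u) for t in rows for u in rows if x(t) == x(u))

def fd_pointfree(rows, x, y):
    """True iff y . [R] . x° is simple, that is, its image is at most the identity."""
    core = {(t, t) for t in rows}                  # coreflexive [R], as in (15)
    xf = {(x(t), t) for t in rows}                 # graph of x on the rows
    yf = {(y(t), t) for t in rows}                 # graph of y on the rows
    rel = comp(comp(yf, core), conv(xf))           # relation (20)
    image = comp(rel, conv(rel))                   # definition (9)
    return all(b == c for (b, c) in image)         # simplicity (6)

# Made-up rows of the form (Pilot, Flight, Date, Departs).
rows = {
    ("Cushing", 83, "9 Aug", "10:15"),
    ("Cushing", 116, "10 Aug", "13:25"),
    ("Clark", 281, "8 Aug", "05:50"),
    ("Clark", 83, "11 Aug", "10:15"),
}

def pilot(t): return t[0]
def flight(t): return t[1]
def flight_date(t): return (t[1], t[2])
def departs(t): return t[3]

observations = [pilot, flight, flight_date, departs]
agree = all(
    fd_pointwise(rows, x, y) == fd_pointfree(rows, x, y)
    for x in observations
    for y in observations
)
print(agree, fd_pointfree(rows, flight, departs), fd_pointfree(rows, pilot, flight))
```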


6 Functional dependencies in general 


Definition. Our own definition of FD starts from the observation that coreflexive relation
$[R]$ and projection functions $x$ and $y$ in (20) can be generalized to arbitrary binary
relations and functions. This leads to the more general definition which follows. (The
use of “$\rightharpoonup$” instead of “$\to$” is intentional: it stresses the move from the restricted to the
generic notion.)

Definition 2. We say that relation $B \xleftarrow{R} A$ satisfies the “$f \rightharpoonup g$” functional dependency,
written $f \stackrel{R}{\rightharpoonup} g$, iff the relation

$g \cdot R \cdot f^\circ$

is simple. Equivalent definitions are

$f \stackrel{R}{\rightharpoonup} g \equiv R \cdot (\ker f) \cdot R^\circ \subseteq \ker g$ (21)

(cf. (19) and (8)) and

$f \stackrel{R}{\rightharpoonup} g \equiv \ker(f \cdot R^\circ) \subseteq \ker g$ (22)

thanks to (10).

Function $f$ (resp. $g$) will be mentioned as the left side or antecedent (resp. right side
or consequent) of FD $f \stackrel{R}{\rightharpoonup} g$. □

Generic relational projection. The little piece of notation which follows will be of some
help in the sequel: given a binary relation $R$ and functions $f$ and $g$ such as in definition
2, we define the $f, g$-projection of $R$ as binary relation

$\pi_{g,f} R = g \cdot R \cdot f^\circ$ (23)

So, fact $f \stackrel{R}{\rightharpoonup} g$ in definition 2 can be rephrased by saying that projection $\pi_{g,f} R$ is simple.

It can be shown that definition (23) extends that of the standard n-ary relation
project operator, whose set-theoretic semantics are as follows [12], for n-ary relation $T$
with schema $S$ and $X \subseteq S$:

$\pi_X T = \{ t[X] \mid t \in T \}$ (24)

In fact, while combining the lower adjoints of shunting rules (16, 17), $\pi_{g,f}$ is itself a
lower adjoint,

$\pi_{g,f} R \subseteq S \equiv R \subseteq g^\circ \cdot S \cdot f$ (25)

meaning that $\pi_{g,f} R$ is the smallest relation $S$ such that, wherever $a$ is $R$-related to $b$,
$(g\,a)$ is $S$-related to $(f\,b)$ (recall (18)). Regarding $R$ and $S$ as sets of pairs, we have

$\pi_{g,f} R = \{ (g\,a, f\,b) \mid (a, b) \in R \}$

It can be easily shown that $\pi_{f,f} R$ is coreflexive wherever $R$ is coreflexive. From this
we draw, for n-ary relation $T$ such as in (24):

$[\pi_x T] = \pi_{x,x} [T]$

(Note the use of the same symbol $\pi$ to denote both the standard set-theoretic projection
operator and the pointfree one.)
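
The two readings of the projection, composition (23) and the set of pairs given after (25), can be compared on a small example. This is a sketch only: the carrier `D`, the relation `r` and the functions `g` and `f` are arbitrary illustrations, and `proj`, `graph` and `galois_25` are hypothetical helper names.

```python
def conv(r):
    return {(a, b) for (b, a) in r}

def comp(r, s):
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def graph(f, dom):
    """A Python function as a relation: the set of pairs (f a, a)."""
    return {(f(a), a) for a in dom}

def proj(g, f, r, left, right):
    """pi_{g,f} R = g . R . f°, for R a set of pairs (a, b), a in left, b in right."""
    return comp(comp(graph(g, left), r), conv(graph(f, right)))

D = range(-3, 4)
r = {(a, b) for a in D for b in D if a == b + 1}
g = abs

def f(b):
    return b % 2

via_composition = proj(g, f, r, D, D)                    # definition (23)
via_pairs = {(g(a), f(b)) for (a, b) in r}               # set-of-pairs reading

def galois_25(s):
    """Both sides of (25) for a candidate relation s."""
    upper = comp(comp(conv(graph(g, D)), s), graph(f, D))   # g° . S . f
    return (via_composition <= s) == (r <= upper)

positives = {(a, a) for a in D if a > 0}                 # a coreflexive relation
projected = proj(f, f, positives, D, D)                  # pi_{f,f} of a coreflexive
print(sorted(via_composition), sorted(projected))
```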

Besides monotonicity and $\cup$-preservation, ensured by lower-adjointness, binary
relation projection obeys a number of useful properties, namely:

$\pi_{id,id} = id$ (26)

$\pi_{f,g} \cdot \pi_{h,k} = \pi_{f \cdot h,\, g \cdot k}$ (27)

$(\pi_{f,g} R)^\circ = \pi_{g,f} (R^\circ)$ (28)

$\langle \pi_{f,g}, \pi_{h,g} \rangle = \pi_{\langle f,h \rangle, g}$ (29)

Another, quite interesting view of (25) is

$\pi_{g,f} R \subseteq S \equiv g\,(S \leftarrow R)\,f$ (30)

where $S \leftarrow R$ is Reynolds’ “arrow combinator”

$g\,(S \leftarrow R)\,f \equiv g \cdot R \subseteq S \cdot f$ (31)


which is extensively studied in [2]. So, a ($f, g$-parametric) relation between two relations
($R$ and $S$) can be equated as a ($R, S$-parametric) relation on the projection functions
$f$ and $g$ themselves.

Examples. The reader may wish to check that $f \stackrel{R}{\rightharpoonup} g$ holds for $R$ any of the relations
tabulated by the $a$ and $b$ columns of

     a      b     f b = b^2     g a = rem a 3
     2     -1         1               2
     5     -1         1               2
    17      1         1               2
    10     -2         4               1
     4     -2         4               1
     1      2         4               1

and

     a         b     f = id     g = \pi_1
    (1,2)     -1       -1           1
    (1,10)    -1       -1           1
    (0,0)      1        1           0
    (5,6)     -2       -2           5
    (5,0)     -2       -2           5
    (1,2)      2        2           1
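
Both tables can be checked mechanically. The sketch below assumes the reading of the two tables in which the second relation uses the first projection as $g$; the names `table1`, `table2` and `satisfies` are illustrative. It tests simplicity of the projection $\{(g\,a, f\,b) \mid (a, b) \in R\}$ directly.

```python
# The a and b columns of the two example tables, as sets of pairs (a, b).
table1 = {(2, -1), (5, -1), (17, 1), (10, -2), (4, -2), (1, 2)}
table2 = {((1, 2), -1), ((1, 10), -1), ((0, 0), 1), ((5, 6), -2), ((5, 0), -2), ((1, 2), 2)}

def satisfies(r, f, g):
    """True iff the projection {(g a, f b) | (a, b) in R} is simple,
    that is, the value f b determines the value g a."""
    p = {(g(a), f(b)) for (a, b) in r}
    return all(c == c2 for (c, d) in p for (c2, d2) in p if d == d2)

def square(b): return b * b
def rem3(a): return a % 3          # agrees with rem a 3 on the non-negative a used here
def identity(b): return b
def first(a): return a[0]

ok1 = satisfies(table1, square, rem3)
ok2 = satisfies(table2, identity, first)
too_strong = satisfies(table1, square, identity)   # b^2 does not determine a itself
print(ok1, ok2, too_strong)
```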


Basic properties. In contrast with (3), equations (21) and (22) are easy to reason about,
as the reader may check by proving the following, elementary properties, which hold
for all $R$, $f$, $g$ of appropriate type:

$f \stackrel{\bot}{\rightharpoonup} g$ (32)

(where $\bot$ denotes the empty relation)

$f \stackrel{R}{\rightharpoonup}\ !$ (33)

(where $1 \xleftarrow{!} A$ denotes the unique, “everywhere-‘nothing’” function of its type)

$id \stackrel{R}{\rightharpoonup} id \equiv R$ is simple (34)

$f \stackrel{R}{\rightharpoonup} f \Leftarrow R \subseteq id$ (35)

An immediate consequence of (35) is

$f \stackrel{id}{\rightharpoonup} f$ (36)


7 The role of injectivity 

Ordering relations by injectivity. It can be observed that what matters about $f$ and
$g$ in (21) is their “degree of injectivity” as measured by $\ker f$ and $\ker g$, in opposite
directions: more injective $f$ and less injective $g$ will strengthen a given FD $f \stackrel{R}{\rightharpoonup} g$. An
extreme case is $f = id$ and $g =\ !$: functional dependency $id \stackrel{R}{\rightharpoonup}\ !$ will always hold for
any $R$, cf. (33).

In order to measure injectivity in general we define the injectivity preorder on relations
as follows:

$R \leq S \equiv \ker S \subseteq \ker R$ (37)

that is, $R \leq S$ means $R$ is less injective than $S$.² Note that $R$ and $S$ must have the same
source but don’t need to share the same target datatype. For instance, it is easy to see
that

$! \leq R$ (38)

$R \leq \bot$ (39)

hold, since the kernel of function $!$ is the top relation and that of the empty relation is
empty.

2 To be more precise, we should write “less injective or more defined” since $\ker$ measures both
properties, recall (14). In case of functions, $f \leq g$ unambiguously means that $f$ is less injective
than $g$.

The fact that pre-composition respects the injectivity preorder,

$R \leq S \Rightarrow R \cdot T \leq S \cdot T$ (40)

is easy to prove:

      $R \leq S$
  $\equiv$   { (37) and (8) }
      $S^\circ \cdot S \subseteq R^\circ \cdot R$
  $\Rightarrow$   { monotonicity of $(T^\circ \cdot)$ and $(\cdot T)$ }
      $T^\circ \cdot S^\circ \cdot S \cdot T \subseteq T^\circ \cdot R^\circ \cdot R \cdot T$
  $\equiv$   { (10) twice, followed by (8,37) }
      $R \cdot T \leq S \cdot T$

(This proof instantiates a more general construction presented in appendix A.)

FD defined via the injectivity ordering. The close relationship between FDs and injectivity
of observations is well captured by the following re-statement of (22) in terms of
(37):

$f \stackrel{R}{\rightharpoonup} g \equiv g \leq f \cdot R^\circ$ (41)

For its conciseness, this definition of FD is very amenable to calculation. Such is the
case of the proof that two FDs with matching antecedent / consequent functions yield a
composite FD,

$f \stackrel{S \cdot R}{\rightharpoonup} h \Leftarrow f \stackrel{R}{\rightharpoonup} g \wedge g \stackrel{S}{\rightharpoonup} h$ (42)

which follows:

      $f \stackrel{R}{\rightharpoonup} g \wedge g \stackrel{S}{\rightharpoonup} h$
  $\equiv$   { (41) twice }
      $g \leq f \cdot R^\circ \wedge h \leq g \cdot S^\circ$
  $\Rightarrow$   { $\leq$-monotonicity of $(\cdot S^\circ)$ (40) followed by (10) }
      $g \cdot S^\circ \leq f \cdot (S \cdot R)^\circ \wedge h \leq g \cdot S^\circ$
  $\Rightarrow$   { $\leq$-transitivity }
      $h \leq f \cdot (S \cdot R)^\circ$
  $\equiv$   { (41) again }
      $f \stackrel{S \cdot R}{\rightharpoonup} h$
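
Both (41) and (42) lend themselves to a finite sanity check. The sketch below is an illustration under assumptions: relations are sets of pairs on a three-element carrier, the functions `f`, `g` and `h` are arbitrary choices, and all helper names are hypothetical. It compares definition 2 with (41) on every relation and tests (42) on a seeded random sample of pairs of relations.

```python
import random
from itertools import product

def conv(r):
    return {(a, b) for (b, a) in r}

def comp(r, s):
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def ker(r):
    return comp(conv(r), r)

def graph(f, dom):
    return {(f(a), a) for a in dom}

def leq(r, s):
    """Injectivity preorder (37): R <= S iff ker S is included in ker R."""
    return ker(s) <= ker(r)

def fd_simple(f, r, g):
    """Definition 2: g . R . f° is simple."""
    rel = comp(comp(g, r), conv(f))
    return all(b == c for (b, c) in comp(rel, conv(rel)))

def fd_order(f, r, g):
    """Restatement (41): g <= f . R°."""
    return leq(g, comp(f, conv(r)))

D = [0, 1, 2]
pairs = list(product(D, D))
relations = [{p for i, p in enumerate(pairs) if mask >> i & 1}
             for mask in range(2 ** len(pairs))]

f = graph(lambda a: a % 2, D)
g = graph(lambda b: b == 0, D)
h = graph(lambda c: c // 2, D)

same = all(fd_simple(f, r, g) == fd_order(f, r, g) for r in relations)

rng = random.Random(0)
sample = [(rng.choice(relations), rng.choice(relations)) for _ in range(300)]
chained = all(
    fd_order(f, comp(s, r), h)                 # conclusion of (42), for S . R
    for (r, s) in sample
    if fd_order(f, r, g) and fd_order(g, s, h)
)
print(same, chained)
```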

A category of functions. Note in passing that (42) and (36) suggest that we can build
a category whose objects are functions $f$, $g$, etc. and whose arrows $f \xrightarrow{R} g$ are
relations which satisfy $f \stackrel{R}{\rightharpoonup} g$.

Simultaneous observations. In the same way $x$ and $y$ in (3) may involve more than one
observable attribute, we would like $f$ and $g$ in (21) to involve more than one observation
function. Multiple observations add more detail and so are likely to be more injective.
The relational split combinator

$(a, b)\langle R, S \rangle c \equiv a\,R\,c \wedge b\,S\,c$ (43)

captures this effect, and facts

$R \leq \langle R, S \rangle$ and $S \leq \langle R, S \rangle$ (44)

are easy to check by recalling

$\ker \langle R, S \rangle = (\ker R) \cap (\ker S)$ (45)

which stems from

$\langle R, S \rangle^\circ \cdot \langle X, Y \rangle = (R^\circ \cdot X) \cap (S^\circ \cdot Y)$ (46)

cf. [4]. Moreover, the following Galois connection

$\langle R, S \rangle \leq T \equiv R \leq T \wedge S \leq T$ (47)

stems from the one underlying $\cap$, as shown in appendix A. The anti-symmetric closure
of $\leq$ yields an equivalence relation

$R \sim S \equiv \ker R = \ker S$ (48)

which is such that, for instance, $! \sim \top$ holds. We also have

$S \leq R \equiv \langle S, R \rangle \sim R$ (49)

The following equivalences will be relevant in the sequel, for suitably typed $R$, $S$ and
$T$:

$R \sim \langle R, R \rangle$ (50)

$\langle R, S \rangle \sim \langle S, R \rangle$ (51)

$\langle T, \langle R, S \rangle \rangle \sim \langle \langle T, R \rangle, S \rangle$ (52)

Function injectivity. Since attributes in the n-ary relational database model are (projection)
functions, we will be particularly interested in comparing functions for their
injectivity. Note that the kernel of a function is an equivalence relation and thus always
reflexive. So, restricted to functions, the $\leq$ ordering is such that, for all $f$,

$! \leq f \leq id$ (53)

and

$f \sim id \equiv f$ is an injection (54)

From (53) and (40) we obtain $f \cdot R \leq R$. From (6) we draw $id \leq f^\circ$ and thus
$S \leq f^\circ \cdot S$, thanks to (40).

More generally, Galois connection

$R \cdot g \leq S \equiv R \leq S \cdot g^\circ$ (55)

holds (cf. proof in appendix A), which can be regarded as the “injectivity counterpart”
of “shunting” rules (16,17).
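
Kernel law (45), from which facts (44) follow, can be confirmed exhaustively on tiny carriers. The sketch below assumes the set-of-pairs encoding of relations; `split`, `leq` and `all_relations` are hypothetical helper names and the carriers are arbitrary.

```python
from itertools import combinations, product

def conv(r):
    return {(a, b) for (b, a) in r}

def comp(r, s):
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def ker(r):
    return comp(conv(r), r)

def split(r, s):
    """(a, b) <R,S> c holds iff a R c and b S c, as in (43)."""
    return {((a, b), c) for (a, c) in r for (b, c2) in s if c == c2}

def leq(r, s):
    """Injectivity preorder (37)."""
    return ker(s) <= ker(r)

def all_relations(target, source):
    pairs = list(product(target, source))
    return [set(c) for n in range(len(pairs) + 1) for c in combinations(pairs, n)]

X, Y, C = ["p", "q"], [0, 1], ["c1", "c2"]
rs = all_relations(X, C)      # candidates for R : X <- C
ss = all_relations(Y, C)      # candidates for S : Y <- C

kernel_law = all(ker(split(r, s)) == (ker(r) & ker(s)) for r in rs for s in ss)   # (45)
bounds = all(leq(r, split(r, s)) and leq(s, split(r, s)) for r in rs for s in ss) # (44)
print(kernel_law, bounds)
```

The split is at least as injective as each component precisely because its kernel is the intersection of the two kernels.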

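FDs on functions. As special cases of relations, functions may also satisfy functional
dependencies. For instance, it will be easy to show that $bagify \stackrel{setify}{\rightharpoonup} id$ holds, where
bagify (resp. setify) is the function which extracts, from a finite list, the bag (resp.
set) of all its elements. From (55) we draw:

$g \cdot h \leq f \equiv f \stackrel{h}{\rightharpoonup} g \equiv g \leq f \cdot h^\circ$ (56)

Thus the equivalence

$g \leq f \equiv f \stackrel{id}{\rightharpoonup} g$ (57)

(let $h = id$) and a more general pattern of FD chaining

$f \stackrel{S \cdot R}{\rightharpoonup} h \Leftarrow f \stackrel{R}{\rightharpoonup} g \wedge g \geq j \wedge j \stackrel{S}{\rightharpoonup} h$ (58)

which extends (42) via (57).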

On $\sim$-equivalence. The discrimination of functions beyond $\sim$-equivalence is unnecessary
in the context of FD reasoning. Since ordering and repetition in “splits” are
$\sim$-irrelevant (recall (50), (51) and (52)) we will abbreviate $\langle f, g \rangle$ by $fg$, or by $gf$,
wherever this notation shorthand is welcome and makes sense.³ Such is the case of a
fact which will prove particularly useful in the sequel:

$f \stackrel{R}{\rightharpoonup} gh \equiv f \stackrel{R}{\rightharpoonup} g \wedge f \stackrel{R}{\rightharpoonup} h$ (59)

3 This is inspired by a similar shorthand popular in the standard notation of relational database
theory: attribute set union, eg. $X \cup Y$, is denoted by simple juxtaposition, $XY$.





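The proof of (59) is as follows:

      $f \stackrel{R}{\rightharpoonup} gh$
  $\equiv$   { (41); expansion of shorthand $gh$ }
      $\langle g, h \rangle \leq f \cdot R^\circ$
  $\equiv$   { (47) }
      $g \leq f \cdot R^\circ \wedge h \leq f \cdot R^\circ$
  $\equiv$   { (41) twice }
      $f \stackrel{R}{\rightharpoonup} g \wedge f \stackrel{R}{\rightharpoonup} h$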

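FD strengthening. The comment above about the contravariant behaviour (concerning
injectivity) of the antecedent and consequent functions of an FD is now made precise,

$h \stackrel{R}{\rightharpoonup} k \Leftarrow h \geq f \wedge f \stackrel{R}{\rightharpoonup} g \wedge g \geq k$ (60)

and justified:

      $h \geq f \wedge f \stackrel{R}{\rightharpoonup} g \wedge g \geq k$
  $\equiv$   { (57) twice }
      $h \stackrel{id}{\rightharpoonup} f \wedge f \stackrel{R}{\rightharpoonup} g \wedge g \stackrel{id}{\rightharpoonup} k$
  $\Rightarrow$   { (42) twice; identity of composition }
      $h \stackrel{R}{\rightharpoonup} k$

The following are corollaries of (60), since $fh \geq f$:

$fh \stackrel{R}{\rightharpoonup} g \Leftarrow f \stackrel{R}{\rightharpoonup} g$ (61)

$f \stackrel{R}{\rightharpoonup} g \Leftarrow f \stackrel{R}{\rightharpoonup} gh$ (62)

By $\Leftarrow$-transitivity, we see that it is always possible in a FD to move observations from
the consequent (‘dependent’) side to the antecedent (‘independent’) one:

$fh \stackrel{R}{\rightharpoonup} g \Leftarrow f \stackrel{R}{\rightharpoonup} gh$ (63)

Moving the “very last” one also makes sense, since

$fhg \stackrel{R}{\rightharpoonup}\ ! \Leftarrow fh \stackrel{R}{\rightharpoonup} g$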

8 The Armstrong-axioms 

In this section we prove the correctness of the Armstrong-axioms [12], which are the 
standard inference rules for FDs underlying relational database theory. We show that 
FD theory is a natural consequence of the pointfree formalization presented earlier on.

In the standard formulation, these axioms involve sets of attributes of a relational
schema $S$ ordered by inclusion, eg. $X \subseteq Y \subseteq S$. Unions of such attribute sets are
written by juxtaposition, eg. $XY$ instead of $X \cup Y$. Since attributes $X$ and $Y$ “are”
(projection) functions, $XY$ will mean the split of such projections. In our setting, we
generalize these to arbitrary functions ordered by injectivity. In fact, it is easy to see
that $X \subseteq Y$ implies $X \leq Y$. (For notation economy, we use the same symbols $X$ and
$Y$ to denote both the attribute symbol and the associated projection function.)

The whole schema $S$ corresponds to a maximal observation. In our setting, this
is captured by the identity function $id$, since (by product reflexion) the split of
all projections in a finite product is the identity. (This observation will be made more
precise in section 9.)

As we have seen, n-ary relational database tables are sets of tuples which we model
by coreflexive relations. For instance, a table with three attributes $T \subseteq A \times B \times C$
will be modelled by coreflexive

$A \times B \times C \xleftarrow{[T]} A \times B \times C$

such that $t\,[T]\,t' \equiv t = t' \wedge t \in T$. In this section, we will abbreviate $[T]$ by $T$.
Proofs of the Armstrong-axioms follow:

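- F1. Reflexivity:

  $x \stackrel{T}{\rightharpoonup} x$ (64)

  (recall (35)). Another way to put it is

  $y \leq x \Rightarrow x \stackrel{T}{\rightharpoonup} y$ (65)

  which follows from $x \stackrel{T}{\rightharpoonup} x \wedge x \geq y$, recall (60). Another way to express (65) is

  $yz \stackrel{T}{\rightharpoonup} y$ (66)

  (let $x := yz$).

- F2. Augmentation:

  $x \stackrel{T}{\rightharpoonup} y \Rightarrow xz \stackrel{T}{\rightharpoonup} yz$ (67)

  Proof:

      $xz \stackrel{T}{\rightharpoonup} yz$
  $\equiv$   { (59) }
      $xz \stackrel{T}{\rightharpoonup} y \wedge xz \stackrel{T}{\rightharpoonup} z$
  $\equiv$   { reflexivity (F1) in version (66) }
      $xz \stackrel{T}{\rightharpoonup} y$
  $\Leftarrow$   { (61) }
      $x \stackrel{T}{\rightharpoonup} y$

  We observe that Maier’s version of this axiom is the implication step just above
  [12].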

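- F3. Additivity (or Union):

  $x \stackrel{T}{\rightharpoonup} y \wedge x \stackrel{T}{\rightharpoonup} z \Rightarrow x \stackrel{T}{\rightharpoonup} yz$ (68)

  This is one of the “ping-pong” sides of (59).

- F4. Projectivity:

  $x \stackrel{T}{\rightharpoonup} yz \Rightarrow x \stackrel{T}{\rightharpoonup} y \wedge x \stackrel{T}{\rightharpoonup} z$ (69)

  This is the other side of (59).

- F5. Transitivity:

  $x \stackrel{T}{\rightharpoonup} y \wedge y \stackrel{T}{\rightharpoonup} z \Rightarrow x \stackrel{T}{\rightharpoonup} z$ (70)

  This stems from (42) for $S$ and $R$ the same coreflexive $T$, in which case $T \cdot T = T$.

- F6. Pseudo-transitivity:

  $x \stackrel{T}{\rightharpoonup} y \wedge wy \stackrel{T}{\rightharpoonup} z \Rightarrow xw \stackrel{T}{\rightharpoonup} z$ (71)

  cf.

      $x \stackrel{T}{\rightharpoonup} y \wedge wy \stackrel{T}{\rightharpoonup} z$
  $\Rightarrow$   { augmentation (F2) }
      $xw \stackrel{T}{\rightharpoonup} yw \wedge wy \stackrel{T}{\rightharpoonup} z$
  $\Rightarrow$   { transitivity (F5) }
      $xw \stackrel{T}{\rightharpoonup} z$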

This completes the six inference axioms which are presented and proved in [12] 
either directly — using (3) — or indirectly, using tuple counting and properties of two 
standard n-ary relation operators: select and project. Our proofs are substantially sim- 
pler thanks to the economy of (41) and derived results. 

To complete the set, we present below two consequences of the standard axioms 
which are adopted for efficiency in FD reasoning: 


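- Decomposition:

  $x \stackrel{T}{\rightharpoonup} y \wedge z \leq y \Rightarrow x \stackrel{T}{\rightharpoonup} z$ (72)

  This is (60) for $f = h$. Alternatively,

      $x \stackrel{T}{\rightharpoonup} y \wedge z \leq y$
  $\Rightarrow$   { (F1) }
      $x \stackrel{T}{\rightharpoonup} y \wedge y \stackrel{T}{\rightharpoonup} z$
  $\Rightarrow$   { (F5) }
      $x \stackrel{T}{\rightharpoonup} z$

- Accumulation:

  $x \stackrel{T}{\rightharpoonup} yz \wedge z \stackrel{T}{\rightharpoonup} wv \Rightarrow x \stackrel{T}{\rightharpoonup} yzv$ (73)

  In fact:

      $x \stackrel{T}{\rightharpoonup} yz \wedge z \stackrel{T}{\rightharpoonup} wv$
  $\Rightarrow$   { (F2) }
      $x \stackrel{T}{\rightharpoonup} yz \wedge yz \stackrel{T}{\rightharpoonup} ywv$
  $\Rightarrow$   { (F5) }
      $x \stackrel{T}{\rightharpoonup} yz \wedge x \stackrel{T}{\rightharpoonup} ywv$
  $\equiv$   { (59) }
      $x \stackrel{T}{\rightharpoonup} yzwv$
  $\Rightarrow$   { (62) }
      $x \stackrel{T}{\rightharpoonup} yzv$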
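
The inference rules of this section can be spot-checked on random tables, which is no substitute for the calculational proofs but gives a quick way to catch a mis-stated rule. The sketch below works in the restricted, pointwise setting of (3): attributes are column indices, juxtaposition is set union, and `holds`, `union` and `check_axioms` are hypothetical names. The random tables are generated from a fixed seed.

```python
import random
from itertools import combinations

def holds(table, x, y):
    """x -> y in the sense of (3); x and y are tuples of column indices."""
    def obs(t, cols):
        return tuple(t[c] for c in cols)
    return all(obs(t, y) == obs(u, y)
               for t in table for u in table if obs(t, x) == obs(u, x))

def union(x, y):
    """Juxtaposition xy of two attribute sets."""
    return tuple(sorted(set(x) | set(y)))

COLS = (0, 1, 2)
SUBSETS = [c for n in range(len(COLS) + 1) for c in combinations(COLS, n)]

def check_axioms(table):
    for x in SUBSETS:
        for y in SUBSETS:
            if set(y) <= set(x) and not holds(table, x, y):            # F1, version (65)
                return False
            for z in SUBSETS:
                xy_ok = holds(table, x, y)
                if xy_ok and not holds(table, union(x, z), union(y, z)):   # F2 (67)
                    return False
                if xy_ok and holds(table, y, z) and not holds(table, x, z):  # F5 (70)
                    return False
                both = xy_ok and holds(table, x, z)
                if holds(table, x, union(y, z)) != both:                # F3 and F4, i.e. (59)
                    return False
    return True

rng = random.Random(1)
tables = [
    [tuple(rng.randrange(2) for _ in COLS) for _ in range(rng.randrange(1, 6))]
    for _ in range(25)
]
all_ok = all(check_axioms(t) for t in tables)
print(all_ok)
```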


9 Keys and attributes 

Keys. Every $x$ such that $x \stackrel{R}{\rightharpoonup} id$, if it exists, is called a superkey for $R$. Keys are
minimal superkeys, that is, they are functions $x$ as above such that, for all $y \leq x$ such
that $y \not\sim x$, $y \stackrel{R}{\rightharpoonup} id$ does not hold. In other words,

$x$ is a key of $R \equiv x \stackrel{R}{\rightharpoonup} id \wedge \langle \forall\, y : y \stackrel{R}{\rightharpoonup} id : y \sim x \vee y \not\leq x \rangle$

From (34) and (53) we draw that $id$ is always a (maximal) superkey for simple relations.

Attributes. Database (relational) files are coreflexives on n-dimensional Cartesian products
$A_1 \times \dots \times A_n$. Each projection $\pi_i$ ($i \in n$) is called an attribute. From $\times$-reflexion
$\langle \pi_1, \dots, \pi_n \rangle = id$ we draw that all attributes together are maximal superkeys:
$\pi_1 \cdots \pi_n \sim id$. In fact, any permutation of this split is an isomorphism (eg.
$swap = \langle \pi_2, \pi_1 \rangle$ for $n = 2$) and therefore a maximal superkey. Wherever $f$ is an arbitrary
split of attributes we denote by $\overline{f}$ the split of the remaining attributes, in any order. The
$\overline{f}$ notation only makes sense in the context of $\sim$-equivalence and obeys the following properties:

$f \overline{f} \sim id$

$\overline{\overline{f}} \sim f$

10 Lossless decomposition

Arbitrary FDs are, in general, hard to maintain because they constrain the update, insert
and delete operations on database files, and waste space. Therefore, instead of allowing
some relation $T$ to satisfy an arbitrary FD, it is preferable to “extract” such a dependency
by decomposing $T$ in two parts: the FD itself, eg. with schema

$S_1 = \{Flight, Departs\}$

(recall FD (2) in our introductory example) and the “rest” of $T$, with schema

$S_2 = \{Pilot, Flight, Date\}$

Such components are referred to, in the standard terminology, as projections of $T$ and
are denoted by $\pi_{S_1} T$ and $\pi_{S_2} T$, respectively. (Read $\pi_S T$ as the projection of $T$ along
schema $S$.)

In this example, the fact that Flight, the antecedent of the selected FD, is
kept in schema $S_2$ has to do with the principle of lossless decomposition: once $T$ is
decomposed in projections $\pi_{S_1} T$ and $\pi_{S_2} T$, by “joining” them one should be able to
recover the original relation⁴:

$(\pi_{S_1} T) \bowtie (\pi_{S_2} T) = T$

Lossless decomposition is a representation technique which is central to relational
database implementation. Of course, not every pair of projections is lossless. A kernel
topic of this theory of database design by decomposition is precisely that of finding conditions
for safe decomposition. Such is the case of extracting functional dependencies,
such as illustrated above, thanks to a couple of theorems which will be dealt with in the
sequel.

The first of these (exercise 6.4 in [12]) is as follows: given relation schemes $Y$
and $Z$ such that $Y \cap Z = X$ and a relation $T$ with schema $YZ$ satisfying FD $X \to Y$,
then lossless decomposition

$T = (\pi_Y T) \bowtie (\pi_Z T)$

holds.

Our proof of this result boils down to almost no-work-at-all thanks to the follow- 
ing binary relation extension of the projection operator given by (23). Recall that (23) 
expresses the standard semantics of relational projection, the only difference being that 
(23) requires two projection functions — antecedent / and consequent g — instead 

4 The standard, set-theoretic semantics of the n-ary relation join operator is as follows [12], for 
relations T, T' with schemes S, S' . respectively: 


T m T' = {t | <3 t, t’ : teTAf'eT': t = S[t] At' — S"[f])} 
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of one. This pair leads to a straightforward definition of join : joining two projections 
which share the same antecedent function, say x, is nothing but binary relation split: 

( 7 r y , x R) h ( 7 t ZjX R) d = (y ■ R ■ x° , z ■ R ■ x°) 

And lossless decomposition can be expressed parametrically with respect to consequent 
functions y and z. 


{^y,xR) ^ (tt z,xR) — ^yz,xR 


that is. 


{y ■ R - x°,z ■ R - x°) = {y,z) ■ R - x° 

It is well-known that such unconditioned ×-fusion doesn't hold in relation algebra,
in general. A theorem in [1] (Theorem 12.30) includes the following side-condition for
such a fusion to take place, where $R$, $S$, $T$ are suitably typed binary relations:

$$\langle R, S \rangle \cdot T = \langle R \cdot T,\; S \cdot T \rangle \;\Leftarrow\; R \cdot (\mathrm{img}\, T) \subseteq R \;\vee\; S \cdot (\mathrm{img}\, T) \subseteq S \qquad (74)$$

For instance, fusion takes place wherever $T$ is simple⁵ or wherever $S$ (or $R$) is a
function and $T$ is its converse, eg.

$$\langle R, f \rangle \cdot f^\circ = \langle R \cdot f^\circ,\; f \cdot f^\circ \rangle \qquad (75)$$

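In our case, from (74) we draw ($R, S := y, z$)

$$\langle y, z \rangle \cdot T = \langle y \cdot T,\; z \cdot T \rangle \;\Leftarrow\; y \le T^\circ \;\vee\; z \le T^\circ$$

(recall (11) and (37)) and, further instantiating $T := R \cdot x^\circ$, we obtain

$$\langle y, z \rangle \cdot (R \cdot x^\circ) = \langle y \cdot R \cdot x^\circ,\; z \cdot R \cdot x^\circ \rangle \;\Leftarrow\; x \xrightarrow{R} y \;\vee\; x \xrightarrow{R} z$$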

In summary, we can establish lossless decomposition via FD extraction as follows,
back to the project/join notation:

$$(\pi_{y,x} R) \bowtie (\pi_{z,x} R) = \pi_{yz,x} R \;\Leftarrow\; x \xrightarrow{R} y \;\vee\; x \xrightarrow{R} z \qquad (76)$$
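At the level of ordinary n-ary tables, the content of (76) can be illustrated by a minimal sketch. The table contents and the helper names (`project`, `natural_join`, `satisfies_fd`, `is_lossless`) are hypothetical and only serve the illustration; the sketch assumes that the two attribute lists together cover the whole scheme.

```python
from itertools import product

def project(rows, attrs):
    """Projection: the set of value tuples of `attrs`, one per row."""
    return {tuple(row[a] for a in attrs) for row in rows}

def satisfies_fd(rows, lhs, rhs):
    """FD lhs -> rhs: rows that agree on lhs must agree on rhs."""
    image = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if image.setdefault(key, value) != value:
            return False
    return True

def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two projections; result tuples follow `out_attrs`."""
    shared = [a for a in left_attrs if a in right_attrs]
    out_attrs = left_attrs + [a for a in right_attrs if a not in shared]
    joined = set()
    for l, r in product(left, right):
        lrow, rrow = dict(zip(left_attrs, l)), dict(zip(right_attrs, r))
        if all(lrow[a] == rrow[a] for a in shared):
            merged = {**lrow, **rrow}
            joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined

def is_lossless(rows, left_attrs, right_attrs):
    """True when joining the two projections gives back exactly `rows`."""
    out_attrs, joined = natural_join(
        project(rows, left_attrs), left_attrs,
        project(rows, right_attrs), right_attrs)
    return joined == project(rows, out_attrs)

# Hypothetical data with X = {dept}, Y = {dept, manager}, Z = {dept, emp}.
staff = [
    {"dept": "d1", "manager": "m1", "emp": "e1"},
    {"dept": "d1", "manager": "m1", "emp": "e2"},
    {"dept": "d2", "manager": "m2", "emp": "e3"},
]
# Here neither dept -> manager nor dept -> emp holds.
mixed = [
    {"dept": "d1", "manager": "m1", "emp": "e1"},
    {"dept": "d1", "manager": "m2", "emp": "e2"},
]

print(satisfies_fd(staff, ["dept"], ["manager"]),
      is_lossless(staff, ["dept", "manager"], ["dept", "emp"]))
print(satisfies_fd(mixed, ["dept"], ["manager"]),
      is_lossless(mixed, ["dept", "manager"], ["dept", "emp"]))
```

In `staff` the FD from `dept` to `manager` holds and the decomposition is lossless; in `mixed` the join of the two projections produces spurious tuples.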

The question arises: are there side-conditions weaker than that of (76) for lossless 
decomposition to take place? It turns out that FD existence is a sufficient but not nec- 
essary condition for safe decomposition: the more general concept of a multi-valued 
dependency, addressed in the sequel, is what is actually required. 


5 Cf. also [4].

11 Multi-valued dependencies

Definition 3. Given subsets $x, y \subseteq S$ of the relation scheme $S$ of an n-ary relation $T$,
this relation is said to satisfy the multi-valued dependency (MVD) $x \twoheadrightarrow y$ iff, for any
two tuples $t, t' \in T$ which “agree” on $x$ there exists a tuple $t'' \in T$ which “agrees”
with $t$ on $xy$ and “agrees” with $t'$ on $S - xy$, that is,

$$\langle \forall\, t, t' : t, t' \in T : t[x] = t'[x] \;\Rightarrow\; \langle \exists\, t'' : t'' \in T : t[xy] = t''[xy] \;\wedge\; t''[S - xy] = t'[S - xy] \rangle \rangle \qquad (77)$$

cf. the following picture:

            x      y      S - xy
     t      α      β      γ
     t''    α      β      γ'
     t'     α      β'     γ'

□

Our efforts towards writing (77) without variables will be considerably softened by
the rules which follow, one generalizing relational inclusion and the other relational
composition:

- Given two binary relations $B \xleftarrow{R,\,S} A$ and two predicates $2 \xleftarrow{\psi} A$ and $2 \xleftarrow{\phi} B$
  (coreflexively denoted by $\Psi$ and $\Phi$, respectively), then

$$\langle \forall\, b, a : (\phi\, b) \wedge (\psi\, a) : b\,R\,a \Rightarrow b\,S\,a \rangle \;\equiv\; \Phi \cdot R \cdot \Psi^\circ \subseteq S \qquad (78)$$

  extends (13), which corresponds to the special case $\Phi = \Psi = id$. (In retrospect,
  notice this is the rule implicit in the reasoning carried out in section 5.)

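- Given two binary relations $B \xleftarrow{R} A$ and $A \xleftarrow{S} C$ and predicate $2 \xleftarrow{\psi} A$
  (coreflexively denoted by $\Psi$), we have, for all $b$, $c$,

$$\langle \exists\, a : \psi\, a : b\,R\,a \wedge a\,S\,c \rangle \;\equiv\; b\,(R \cdot \Psi \cdot S)\,c \qquad (79)$$

  which extends relational composition (for $\Psi = id$ we are back to $R \cdot S$).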

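In the spirit of the overline notation of section 9, we denote $S - xy$ by $\overline{xy}$ in the following
conversion of the existential quantification of (77) into pointfree notation:

$$\begin{array}{cl}
 & \langle \exists\, t'' : t'' \in T : t[xy] = t''[xy] \wedge t''[\overline{xy}] = t'[\overline{xy}] \rangle \\
\equiv & \{\ \text{(79) for } \psi := (\in T) \text{, and so on}\ \} \\
 & t\,(\ker xy \cdot [T] \cdot \ker \overline{xy})\,t'
\end{array}$$

Then we include this in the overall formula and reason:

$$\begin{array}{cl}
 & \langle \forall\, t, t' : t, t' \in T : t[x] = t'[x] \Rightarrow t\,(\ker xy \cdot [T] \cdot \ker \overline{xy})\,t' \rangle \\
\equiv & \{\ \text{rule (78) for } \phi = \psi = (\in T)\ \} \\
 & [T] \cdot (\ker x) \cdot [T]^\circ \subseteq \ker xy \cdot [T] \cdot \ker \overline{xy} \\
\equiv & \{\ \text{kernel of composition}\ \} \\
 & \ker (x \cdot [T]^\circ) \subseteq \ker xy \cdot [T] \cdot \ker \overline{xy}
\end{array}$$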

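Thus we reach the following pointfree definition, in which we generalize $[T]$ to an
arbitrary endo-relation $A \xleftarrow{R} A$ and introduce notation $x \overset{R}{\twoheadrightarrow} y$ (read: $x$ multi-determines
$y$ in $R$) in the spirit of $x \xrightarrow{R} y$ earlier on:

$$x \overset{R}{\twoheadrightarrow} y \;\stackrel{\rm def}{=}\; \ker (x \cdot R^\circ) \subseteq (\ker xy) \cdot R \cdot \ker \overline{xy} \qquad (80)$$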
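As a sanity check, definition (80) can be compared with the pointwise formulation (77) on small tables. The sketch below is only illustrative: relations are finite sets of (target, source) pairs, attributes are tuple positions, the attribute functions are restricted to the tuples of $T$ (an assumption that suffices because both sides of (80) are guarded by the coreflexive $[T]$), and all helper names and the sample data are hypothetical.

```python
def compose(r, s):
    """R . S relates b to c when b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def converse(r):
    return {(a, b) for (b, a) in r}

def ker(r):
    """Kernel of R, that is, converse(R) . R."""
    return compose(converse(r), r)

def graph(f, universe):
    """A function as a relation: f(a) is related to a."""
    return {(f(a), a) for a in universe}

def picker(indices):
    """An attribute list seen as a function on tuples."""
    return lambda t: tuple(t[i] for i in indices)

def mvd_pointfree(T, x, y, arity):
    """Right-hand side of (80) with R := [T]."""
    rest = [i for i in range(arity) if i not in x + y]
    R = {(t, t) for t in T}  # coreflexive [T]
    fx, fxy, frest = (graph(picker(ix), T) for ix in (x, x + y, rest))
    lhs = ker(compose(fx, converse(R)))
    rhs = compose(compose(ker(fxy), R), ker(frest))
    return lhs <= rhs

def mvd_pointwise(T, x, y, arity):
    """Direct transcription of (77)."""
    rest = [i for i in range(arity) if i not in x + y]
    px, pxy, prest = picker(x), picker(x + y), picker(rest)
    return all(
        any(pxy(t) == pxy(t2) and prest(t2) == prest(t1) for t2 in T)
        for t in T for t1 in T if px(t) == px(t1))

# Hypothetical table over (course, teacher, book).
courses = {("db", "ann", "codd"), ("db", "ann", "maier"),
           ("db", "bob", "codd"), ("db", "bob", "maier")}
broken = courses - {("db", "bob", "maier")}

for table in (courses, broken):
    print(mvd_pointfree(table, [0], [1], 3), mvd_pointwise(table, [0], [1], 3))
```

Both formulations agree on the two tables: the MVD from course to teacher holds in `courses` and fails once one combination is removed.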

Why does definition (80) require $R$ to have the same source and target type? Just expand
the right hand-side of (80) and “shunt” wherever possible,

$$(xy \cdot R \cdot x^\circ) \cdot (x \cdot R^\circ \cdot \overline{xy}^\circ) \subseteq xy \cdot R \cdot \overline{xy}^\circ \qquad (81)$$

to obtain diagram

[Diagram of (81); only the arrow label $R$ survives in the source.]

Therefore, MVD $x \overset{R}{\twoheadrightarrow} y$ requires $R$ to be an endo-relation. This diagram provides
an alternative meaning for MVDs: $x \overset{R}{\twoheadrightarrow} y$ holds iff projection $\pi_{xy,\overline{xy}} R$ “factorizes”
through $x$. A trivial example of such $R$ is $\bot$, which satisfies any MVD. In case of
transitive $R$ (ie. such that $R \cdot R \subseteq R$ holds) it is easy to see that condition

$$(\ker x) \cdot R^\circ \subseteq R$$

is sufficient for (81) to hold, since $(xy \cdot \_ \cdot \overline{xy}^\circ)$ is monotonic. Thus $\top$ satisfies any
MVD.

As it happens with FDs, the axiomatic theory of MVDs assumes $R$ to be “a set of
tuples”. As above, we model such a set by a coreflexive relation and use capital letter $T$
to stress this assumption.

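MVDs are more general and less intuitive than FDs. It is known from the standard
theory that FDs are just a particular case of MVDs, that is,

$$x \xrightarrow{T} y \;\Rightarrow\; x \overset{T}{\twoheadrightarrow} y \qquad (82)$$

holds. Our proof of this fact (often termed the conversion axiom) is as follows:

$$\begin{array}{cl}
 & x \xrightarrow{T} y \\
\Rightarrow & \{\ \text{augmentation (67) for } z := x\ \} \\
 & x \xrightarrow{T} xy \\
\equiv & \{\ \text{FD definition (22)}\ \} \\
 & \ker (x \cdot T^\circ) \subseteq \ker xy \\
\Rightarrow & \{\ \text{composition is monotone, } T = T^\circ = T \cdot T \text{ for coreflexive } T\ \} \\
 & \ker (x \cdot T^\circ) \subseteq \ker xy \cdot T \\
\Rightarrow & \{\ \text{in general, } f \le id \text{, thus } T \subseteq T \cdot \ker f\ \} \\
 & \ker (x \cdot T^\circ) \subseteq (\ker xy) \cdot T \cdot \ker \overline{xy} \\
\equiv & \{\ \text{definition (80)}\ \} \\
 & x \overset{T}{\twoheadrightarrow} y
\end{array}$$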
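The conversion axiom (82) can also be checked exhaustively on a tiny universe. The following sketch works at the pointwise level, over all 256 tables whose tuples are drawn from $\{0,1\}^3$, with $x$ and $y$ taken to be the first and second positions; these choices and the helper names are assumptions made for illustration only.

```python
from itertools import product

def pick(t, indices):
    return tuple(t[i] for i in indices)

def satisfies_fd(T, x, y):
    """FD x -> y: tuples agreeing on x agree on y."""
    return all(pick(t, y) == pick(u, y)
               for t in T for u in T if pick(t, x) == pick(u, x))

def satisfies_mvd(T, x, y, arity):
    """MVD x ->> y, following the pointwise definition (77)."""
    rest = [i for i in range(arity) if i not in x + y]
    return all(
        any(pick(t, x + y) == pick(w, x + y) and pick(w, rest) == pick(u, rest)
            for w in T)
        for t in T for u in T if pick(t, x) == pick(u, x))

universe = list(product([0, 1], repeat=3))
tables = [{t for bit, t in zip(bits, universe) if bit}
          for bits in product([0, 1], repeat=len(universe))]

x, y = [0], [1]
# Tables violating (82): there should be none.
counterexamples = [T for T in tables
                   if satisfies_fd(T, x, y) and not satisfies_mvd(T, x, y, 3)]
# Tables showing that the converse implication fails.
strict = [T for T in tables
          if satisfies_mvd(T, x, y, 3) and not satisfies_fd(T, x, y)]

print(len(counterexamples), len(strict) > 0)
```

No table violates (82), while the full table of all eight tuples is one of those satisfying the MVD but not the FD.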

Lossless decomposition. The conversion axiom is given in [12] as a corollary of the
theorem of lossless decomposition of MVDs. This theorem (number 7.1 in [12]) states
that fact $x \overset{T}{\twoheadrightarrow} y$ holds if and only if $T$ decomposes losslessly into two relations with
schemata $xy$ and $x\overline{yx}$, respectively:

$$x \overset{T}{\twoheadrightarrow} y \;\equiv\; (\pi_{y,x} T) \bowtie (\pi_{\overline{yx},x} T) = \pi_{y\overline{yx},x} T \qquad (83)$$

A pointwise proof of this result is given in [12] in “implication-first” logic style, in two
parts: the if side followed by the only if side of the equivalence. Being performed
as they are directly over (77), these proofs aren't easy to follow with their existential
and universal quantifications over no less than six tuple variables. By
contrast, our proof is a sequence of pointfree equivalences:

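$$\begin{array}{cl}
 & (\pi_{y,x} T) \bowtie (\pi_{\overline{yx},x} T) = \pi_{y\overline{yx},x} T \\
\equiv & \{\ \text{(23)}\ \} \\
 & \langle y \cdot T \cdot x^\circ,\; \overline{yx} \cdot T \cdot x^\circ \rangle = y\overline{yx} \cdot T \cdot x^\circ \\
\equiv & \{\ \text{since } \langle R, S \rangle \cdot T \subseteq \langle R \cdot T, S \cdot T \rangle \text{ holds by monotonicity}\ \} \\
 & \langle y \cdot T \cdot x^\circ,\; \overline{yx} \cdot T \cdot x^\circ \rangle \subseteq y\overline{yx} \cdot T \cdot x^\circ \\
\equiv & \{\ \text{“split twist” rule (97); converses}\ \} \\
 & \langle y \cdot T \cdot x^\circ,\; id \rangle \cdot (x \cdot T^\circ \cdot \overline{yx}^\circ) \subseteq \langle y,\; x \cdot T^\circ \rangle \cdot \overline{yx}^\circ \\
\equiv & \{\ x = x \cdot x^\circ \cdot x\ \} \\
 & \langle y \cdot T \cdot x^\circ,\; id \rangle \cdot x \cdot x^\circ \cdot x \cdot T^\circ \cdot \overline{yx}^\circ \subseteq \langle y,\; x \cdot T^\circ \rangle \cdot \overline{yx}^\circ \\
\equiv & \{\ \text{(84) below twice, since both } x \cdot x^\circ \text{ and } T^\circ \text{ are coreflexive}\ \} \\
 & \langle y \cdot T \cdot x^\circ,\; x \cdot x^\circ \rangle \cdot x \cdot T^\circ \cdot \overline{yx}^\circ \subseteq \langle y, x \rangle \cdot T^\circ \cdot \overline{yx}^\circ \\
\equiv & \{\ \text{(75) in which } f = x \text{ followed by (84) in which } \Phi = T\ \} \\
 & (\langle y, x \rangle \cdot T \cdot x^\circ) \cdot (x \cdot T^\circ \cdot \overline{yx}^\circ) \subseteq \langle y, x \rangle \cdot T \cdot \overline{yx}^\circ \\
\equiv & \{\ \text{(81)}\ \} \\
 & x \overset{T}{\twoheadrightarrow} y
\end{array}$$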

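Two steps in the calculation above rely on fact

$$\langle R, S \rangle \cdot \Phi = \langle R,\; S \cdot \Phi \rangle \qquad (84)$$

which is easy to justify:

$$\begin{array}{cl}
 & \langle R, S \rangle \cdot \Phi = \langle R,\; S \cdot \Phi \rangle \\
\equiv & \{\ \text{split pointfree definition}\ \} \\
 & (\pi_1^\circ \cdot R \cap \pi_2^\circ \cdot S) \cdot \Phi = \pi_1^\circ \cdot R \cap \pi_2^\circ \cdot S \cdot \Phi \\
\equiv & \{\ \text{converses and commutativity}\ \} \\
 & \Phi \cdot (S^\circ \cdot \pi_2 \cap R^\circ \cdot \pi_1) = (\Phi \cdot S^\circ \cdot \pi_2) \cap (R^\circ \cdot \pi_1) \\
\Leftarrow & \{\ \text{fact } \langle \forall\, S, T :: R \cdot S \cap T = R \cdot (S \cap T) \rangle \equiv R \subseteq id \text{ in Ex. 11.22 of [3]}\ \} \\
 & \Phi \subseteq id
\end{array}$$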
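Theorem (83) lends itself to an exhaustive check in its classical n-ary reading (Theorem 7.1 of [12]): a table satisfies the MVD exactly when it equals the join of its projections on $xy$ and on $x$ together with the remaining attributes. The sketch below enumerates all tables over $\{0,1\}^3$; the choice of universe, the positions standing for $x$ and $y$, and the helper names are assumptions made for illustration.

```python
from itertools import product

def pick(t, indices):
    return tuple(t[i] for i in indices)

def satisfies_mvd(T, x, y, arity):
    """MVD x ->> y, following the pointwise definition (77)."""
    rest = [i for i in range(arity) if i not in x + y]
    return all(
        any(pick(t, x + y) == pick(w, x + y) and pick(w, rest) == pick(u, rest)
            for w in T)
        for t in T for u in T if pick(t, x) == pick(u, x))

def decomposes_losslessly(T, x, y, arity):
    """Join the projections on x+y and on x+rest and compare with T."""
    rest = [i for i in range(arity) if i not in x + y]
    left = {pick(t, x + y) for t in T}
    right = {pick(t, x + rest) for t in T}
    joined = set()
    for l, r in product(left, right):
        if l[:len(x)] == r[:len(x)]:  # both projections list x first
            values = dict(zip(x + y, l))
            values.update(zip(x + rest, r))
            joined.add(tuple(values[i] for i in range(arity)))
    return joined == set(T)

universe = list(product([0, 1], repeat=3))
tables = [{t for bit, t in zip(bits, universe) if bit}
          for bits in product([0, 1], repeat=len(universe))]

disagreements = [T for T in tables
                 if satisfies_mvd(T, [0], [1], 3)
                 != decomposes_losslessly(T, [0], [1], 3)]
print(len(tables), len(disagreements))
```

The two predicates coincide on every table of this universe, as the equivalence predicts.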

Further MVD reasoning. MVD theory generalizes FD theory. Some results stem directly
from the conversion axiom, as is the case of the MVD reflexivity axiom,

$$y \le x \;\Rightarrow\; x \overset{T}{\twoheadrightarrow} y \qquad (85)$$

since

$$\begin{array}{cl}
 & x \overset{T}{\twoheadrightarrow} y \\
\Leftarrow & \{\ \text{conversion (82)}\ \} \\
 & x \xrightarrow{T} y \\
\Leftarrow & \{\ \text{FD reflexivity (64)}\ \} \\
 & y \le x
\end{array}$$

Some others are new, for instance the complementation axiom:

$$x \overset{R}{\twoheadrightarrow} y \;\Rightarrow\; x \overset{R}{\twoheadrightarrow} \overline{y} \qquad (86)$$

which the reader may wish to prove as an exercise. All other MVD inference axioms
can be found in [12], section 7.4.1. In this paper we don't go beyond this point.

12 Conclusions 

This paper presents a pointfree version of functional dependency theory, the kernel of
relational database design “à la Codd”. Contrary to the intuition that a binary relation is
just a particular case of n-ary relation, this paper shows the effectiveness of the former
in “explaining” and reasoning about the latter.

It turns out that the theory becomes more general and considerably simpler. The
adoption of the (pointfree) binary relation calculus is beneficial in several respects. First,
the fact that pointfree notation abstracts from “points” or variables makes the reasoning
more compact and effective. Elegant formulae such as (41), when compared with
(3), come in support of this claim. Second, proofs are performed by easy-to-follow
calculations. Third, one is able to generalize the original theory, as happens with our
generalization of attributes to arbitrary (suitably typed) functions in FDs and MVDs.

In retrospect, the use of coreflexive relations to model sets of tuples and predicates in
the binary relation calculus (instead of arbitrarily partitioning attributes in “source ones”
and “target ones”) is perhaps the main ingredient of the simplification and subsequent
generalization. (A similar strategy has been followed in [14] concerning a pointfree
model of hash tables.)

13 Future work 

While addressing the foundations of FD theory in a pointfree style, no claim is made 
in this paper for extending or improving the standard theory. What is gained is a better 
starting point for relational database theory [12], a fairly large (and often convoluted)
body of knowledge⁶.

The effectiveness of the approach can only be tested once more and more results 
are dealt with. In this paper, multivalued dependencies have only been hinted at. Join 
dependencies and difunctional dependencies [9] have not been considered at all. The use 
of functional dependencies in solving ambiguities in multiple parameter type classes 
in the Haskell type system [10] may happen to be another area of application of the 
reasoning techniques developed in this paper. 

6 FDs account for no more than 20% of pages in Maier's book [12].

At the level of the pointfree “transform” itself, our notion of kernel of a binary relation
is a conservative one when compared to that of [7], which can be considered as an
alternative (both coincide on functions). Moreover, left and right conditions [8] should
also be exploited as alternatives to coreflexives in n-ary relation modelling. Finally,
the connection between binary relation projection and Reynolds' “relation on functions”
expressed by (30) is worth studying in more detail, taking into consideration the
corresponding point-free theory developed in [2].

Concerning representation theory and lossless decomposition, some recent results
in [13] and [16] should be taken into account and generalized.

A A little preorder construction


The concept of a preorder — ie. that of a reflexive and transitive endo-relation — is 
central to the mathematics of computing. It paves the way to Galois connections and 
other interesting topics (eg. lexicographic orders, etc.). In this annex we concentrate 
on a particular preorder construction which is used extensively in this paper. For more 
about preorders see eg. [1] and [4]. 

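The construction. Let $A \xleftarrow{\sqsubseteq} A$ be a preorder. Given a function $A \xleftarrow{h} B$, the
relation $B \xleftarrow{\preceq} B$ defined by

$$\preceq \;=\; h^\circ \cdot \sqsubseteq \cdot h \qquad (87)$$

is also a preorder: it is reflexive,

$$\begin{array}{cl}
 & id \subseteq\; \preceq \\
\equiv & \{\ \text{(87) and shunting (16)}\ \} \\
 & h \subseteq\; \sqsubseteq \cdot h \\
\Leftarrow & \{\ (\cdot\, h) \text{ is monotonic}\ \} \\
 & id \subseteq\; \sqsubseteq \\
\equiv & \{\ \sqsubseteq \text{ is reflexive}\ \} \\
 & True
\end{array}$$

and transitive:

$$\begin{array}{cl}
 & \preceq \cdot \preceq \\
= & \{\ \text{(87) twice; associativity of composition}\ \} \\
 & h^\circ \cdot \sqsubseteq \cdot (h \cdot h^\circ) \cdot \sqsubseteq \cdot h \\
\subseteq & \{\ h \text{ is simple (6)}\ \} \\
 & h^\circ \cdot \sqsubseteq \cdot \sqsubseteq \cdot h \\
\subseteq & \{\ \sqsubseteq \text{ is transitive}\ \} \\
 & h^\circ \cdot \sqsubseteq \cdot h \\
= & \{\ \text{(87)}\ \} \\
 & \preceq
\end{array}$$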
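Construction (87) can be illustrated on a small carrier. In the sketch below, relations are finite sets of (target, source) pairs, $h$ is string length and the base preorder is the usual order on integers; the word list and all names are hypothetical choices made for illustration.

```python
def compose(r, s):
    """R . S relates b to c when b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def converse(r):
    return {(a, b) for (b, a) in r}

words = ["", "a", "bb", "cc", "ddd"]         # carrier B (hypothetical)
lengths = {len(w) for w in words}             # carrier A
h = {(len(w), w) for w in words}              # the function h as a relation
base = {(m, n) for m in lengths for n in lengths if m <= n}  # preorder on A

# (87): the lifted relation is converse(h) . base . h
lifted = compose(compose(converse(h), base), h)

identity = {(w, w) for w in words}
is_reflexive = identity <= lifted
is_transitive = compose(lifted, lifted) <= lifted
# Distinct words of equal length are related both ways, so the lifted
# relation is a preorder but not a partial order.
both_ways = ("bb", "cc") in lifted and ("cc", "bb") in lifted

print(is_reflexive, is_transitive, both_ways)
```

The lifted relation compares words by length, matching the pointwise reading of (87) in which two elements are related exactly when their $h$-images are.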

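Example. The injectivity preorder defined earlier on (37) is an example of this
construction for $h = \ker$, $\preceq \;=\; \le^\circ$ and $\sqsubseteq \;=\; \subseteq$:

$$\le^\circ \;=\; \ker^\circ \cdot \subseteq \cdot \ker \qquad (88)$$

that is,

$$\le \;=\; \ker^\circ \cdot \subseteq^\circ \cdot \ker$$

(Note the extra converse operator.)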

Preorder homomorphism. By construction, (87) establishes $h$ as a preorder
homomorphism, cf.

$$a \preceq a' \;\equiv\; h\,a \sqsubseteq h\,a'$$

in pointwise notation, which can be exploited to “lift” results from the $\sqsubseteq$ to the $\preceq$
order. We present two such results, one concerning monotonicity and the other Galois
connections. For space economy, both will be presented restricted to endo-functions.
(The general formulation is similar.)

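Lifting monotone operators. Let $A \xleftarrow{k} A$ be a $\sqsubseteq$-monotonic endo-function, and $k'$
be its $h$-counterpart, that is,

$$h \cdot k' = k \cdot h \qquad (89)$$

Then $k'$ is $\preceq$-monotonic:

$$\begin{array}{clr}
 & \preceq \;\subseteq\; k'^\circ \cdot \preceq \cdot k' & (90) \\
\equiv & \{\ \text{(87) twice; (10)}\ \} & \\
 & h^\circ \cdot \sqsubseteq \cdot h \;\subseteq\; k'^\circ \cdot h^\circ \cdot \sqsubseteq \cdot h \cdot k' & \\
\equiv & \{\ \text{(89) twice; (10)}\ \} & \\
 & h^\circ \cdot \sqsubseteq \cdot h \;\subseteq\; h^\circ \cdot k^\circ \cdot \sqsubseteq \cdot k \cdot h & \\
\Leftarrow & \{\ (h^\circ \cdot \_ \cdot h) \text{ is monotonic}\ \} & \\
 & \sqsubseteq \;\subseteq\; k^\circ \cdot \sqsubseteq \cdot k & \\
\equiv & \{\ \text{since } k \text{ is } \sqsubseteq\text{-monotonic}\ \} & \\
 & True &
\end{array}$$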


Examples. From

$$\ker (R \cdot T) = T^\circ \cdot (\ker R) \cdot T \qquad (91)$$

we identify, for $h = \ker$, $k' = (\cdot\, T)$ and $k = (T^\circ \cdot \_ \cdot T)$. Since $k$ is $\subseteq$-monotonic,
from (90) we draw that $k'$ is $\le^\circ$-monotonic, which is equivalent to being $\le$-monotonic.
This justifies equation (40) in the main body of the paper. A similar argument can be
provided to justify $\le$-monotonicity of any relator $F$,

$$R \le S \;\Rightarrow\; F\,R \le F\,S$$

for $k = k' = F$, since $F$ is $\subseteq$-monotonic and

$$\ker (F\,R) = F\,(\ker R)$$

holds.
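Both (91) and the resulting monotonicity of post-composition with respect to the injectivity preorder can be checked by brute force over all relations on a two-element carrier. The sketch below does so; the carrier and the helper names are assumptions made for illustration, with relations as finite sets of (target, source) pairs.

```python
from itertools import product

def compose(r, s):
    """R . S relates b to c when b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def converse(r):
    return {(a, b) for (b, a) in r}

def ker(r):
    return compose(converse(r), r)

def leq(r, s):
    """Injectivity preorder (37): R <= S iff ker S is contained in ker R."""
    return ker(s) <= ker(r)

points = [0, 1]
pairs = list(product(points, repeat=2))
relations = [frozenset(p for bit, p in zip(bits, pairs) if bit)
             for bits in product([0, 1], repeat=len(pairs))]

# (91): ker (R . T) = converse(T) . ker R . T
law_91 = all(ker(compose(R, T)) == compose(compose(converse(T), ker(R)), T)
             for R in relations for T in relations)

# Lifted monotonicity: R <= S implies R . T <= S . T
lifted_monotone = all(leq(compose(R, T), compose(S, T))
                      for R in relations for S in relations
                      for T in relations if leq(R, S))

print(law_91, lifted_monotone)
```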

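Lifting Galois connections. Suppose that functions $A \xleftarrow{k,\,j} A$ are Galois connected
via preorder $\sqsubseteq$ and that $k'$, $j'$ are the $h$-counterparts of lower-adjoint $k$ and upper-adjoint
$j$, respectively. That is, facts

$$k^\circ \cdot \sqsubseteq \;=\; \sqsubseteq \cdot j \qquad (92)$$

$$h \cdot k' = k \cdot h \qquad (93)$$

$$h \cdot j' = j \cdot h \qquad (94)$$

hold. Then $k'$, $j'$ are $\preceq$-Galois connected,

$$k'^\circ \cdot \preceq \;=\; \preceq \cdot j' \qquad (95)$$

as proved below:

$$\begin{array}{cl}
 & k'^\circ \cdot \preceq \\
= & \{\ \text{(87)}\ \} \\
 & k'^\circ \cdot h^\circ \cdot \sqsubseteq \cdot h \\
= & \{\ \text{(93) and converses}\ \} \\
 & h^\circ \cdot k^\circ \cdot \sqsubseteq \cdot h \\
= & \{\ \text{(92) followed by (94)}\ \} \\
 & h^\circ \cdot \sqsubseteq \cdot h \cdot j' \\
= & \{\ \text{(87)}\ \} \\
 & \preceq \cdot j'
\end{array}$$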

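Examples. Consider the following instances of $T$ in (91) and corresponding instances
of $k$, $k'$, $j$, $j'$, for some function $g$ of appropriate type:

$$\begin{array}{ll}
T := g: & j' = (\cdot\, g), \quad j = (g^\circ \cdot \_ \cdot g) \\
T := g^\circ: & k' = (\cdot\, g^\circ), \quad k = (g \cdot \_ \cdot g^\circ)
\end{array}$$

The fact that $k$ and $j$ are Galois connected stems from the composition of shunting rules
(16,17):

$$g \cdot X \cdot g^\circ \subseteq Y \;\equiv\; X \subseteq g^\circ \cdot Y \cdot g$$

Then, from (95) we draw

$$k'^\circ \cdot \le^\circ \;=\; \le^\circ \cdot j'$$

which, taking converses, is the same as

$$j'^\circ \cdot \le \;=\; \le \cdot k'$$

that is,

$$(\cdot\, g)^\circ \cdot \le \;=\; \le \cdot (\cdot\, g^\circ)$$

(ie. (55)) holds.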
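Read pointwise, the last equality says that $R \cdot g \le S$ holds exactly when $R \le S \cdot g^\circ$ does. A minimal sketch of a brute-force check follows, ranging over all functions and relations on a two-element carrier; the carrier and the helper names are assumptions made for illustration.

```python
from itertools import product

def compose(r, s):
    """R . S relates b to c when b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def converse(r):
    return {(a, b) for (b, a) in r}

def ker(r):
    return compose(converse(r), r)

def leq(r, s):
    """Injectivity preorder (37): R <= S iff ker S is contained in ker R."""
    return ker(s) <= ker(r)

points = [0, 1]
pairs = list(product(points, repeat=2))
relations = [frozenset(p for bit, p in zip(bits, pairs) if bit)
             for bits in product([0, 1], repeat=len(pairs))]
# Every total function g on the carrier, as a relation (g(a), a).
functions = [{(image[a], a) for a in points}
             for image in product(points, repeat=len(points))]

# R . g <= S  iff  R <= S . converse(g)
connected = all(
    leq(compose(R, g), S) == leq(R, compose(S, converse(g)))
    for g in functions for R in relations for S in relations)

print(len(functions), connected)
```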

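A similar argument will justify Galois connection (47), stemming from relational
split being ker-homomorphic to relational meet (45), which is the upper-adjoint in its
defining Galois connection:

$$T \subseteq R \cap S \;\equiv\; T \subseteq R \wedge T \subseteq S \qquad (96)$$

Because of the extra converse in $\le^\circ$ in (88), the fact that meet is the upper-adjoint wrt.
$\subseteq$ casts split as the lower-adjoint wrt. $\le$:

$$\langle R, S \rangle \le T \;\equiv\; R \le T \wedge S \le T$$

A more explicit argument is as follows:

$$\begin{array}{cl}
 & \langle R, S \rangle \le T \\
\equiv & \{\ \text{(37) and (45)}\ \} \\
 & \ker T \subseteq (\ker R) \cap (\ker S) \\
\equiv & \{\ \text{(96)}\ \} \\
 & \ker T \subseteq \ker R \;\wedge\; \ker T \subseteq \ker S \\
\equiv & \{\ \text{(37) twice}\ \} \\
 & R \le T \;\wedge\; S \le T
\end{array}$$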
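The calculation above can be replayed by brute force. The sketch below checks, over all relations on a two-element carrier, both that the kernel of a split is the meet of the kernels and that the split of $R$ and $S$ is below $T$ exactly when both $R$ and $S$ are; the carrier and the helper names are assumptions made for illustration.

```python
from itertools import product

def compose(r, s):
    """R . S relates b to c when b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def converse(r):
    return {(a, b) for (b, a) in r}

def ker(r):
    return compose(converse(r), r)

def split(r, s):
    """Split <R, S> relates (b, c) to a when b R a and c S a."""
    return {((b, c), a) for (b, a) in r for (c, a2) in s if a == a2}

def leq(r, s):
    """Injectivity preorder (37): R <= S iff ker S is contained in ker R."""
    return ker(s) <= ker(r)

points = [0, 1]
pairs = list(product(points, repeat=2))
relations = [frozenset(p for bit, p in zip(bits, pairs) if bit)
             for bits in product([0, 1], repeat=len(pairs))]

# Kernel of a split is the meet of the kernels.
kernel_of_split = all(ker(split(R, S)) == ker(R) & ker(S)
                      for R in relations for S in relations)

# <R, S> <= T  iff  R <= T and S <= T
galois = all(leq(split(R, S), T) == (leq(R, T) and leq(S, T))
             for R in relations for S in relations for T in relations)

print(kernel_of_split, galois)
```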

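B The “split twist” rule

A step in the proof of lossless decomposition (83) is based on the following equivalence,

$$\langle R, S \rangle \subseteq \langle U, V \rangle \cdot X \;\equiv\; \langle R, id \rangle \cdot S^\circ \subseteq \langle U, X^\circ \rangle \cdot V^\circ \qquad (97)$$

itself a consequence of

$$\langle R, S \rangle \cdot T \subseteq \langle U, V \rangle \cdot X \;\equiv\; \langle R, T^\circ \rangle \cdot S^\circ \subseteq \langle U, X^\circ \rangle \cdot V^\circ \qquad (98)$$

for $T := id$. In order to prove (98), we reason using points $x$, $y$ and $z$:

$$\begin{array}{cl}
 & (y, z)\,(\langle R, S \rangle \cdot T)\,x \\
\equiv & \{\ \text{composition and split}\ \} \\
 & \langle \exists\, u :: y\,R\,u \wedge z\,S\,u \wedge u\,T\,x \rangle \\
\equiv & \{\ \text{converses}\ \} \\
 & \langle \exists\, u :: y\,R\,u \wedge x\,T^\circ\,u \wedge u\,S^\circ\,z \rangle \\
\equiv & \{\ \text{split and composition}\ \} \\
 & (y, x)\,(\langle R, T^\circ \rangle \cdot S^\circ)\,z
\end{array}$$

Similarly,

$$(y, z)\,(\langle U, V \rangle \cdot X)\,x \;\equiv\; (y, x)\,(\langle U, X^\circ \rangle \cdot V^\circ)\,z$$

and so on.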
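The pointwise argument amounts to saying that both sides of each inclusion in (98) are the same set of triples up to swapping $z$ and $x$. A minimal sketch of a randomized check follows; the carrier, the sampling density, the seed and the helper names (including `twist`) are assumptions made for illustration, and relations are finite sets of (target, source) pairs.

```python
import random

def compose(r, s):
    """R . S relates b to c when b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

def converse(r):
    return {(a, b) for (b, a) in r}

def split(r, s):
    """Split <R, S> relates (b, c) to a when b R a and c S a."""
    return {((b, c), a) for (b, a) in r for (c, a2) in s if a == a2}

def twist(rel):
    """Swap the roles of z and x: ((y, z), x) becomes ((y, x), z)."""
    return {((y, x), z) for ((y, z), x) in rel}

def random_relation(rng, carrier, density=0.4):
    return {(b, a) for b in carrier for a in carrier if rng.random() < density}

rng = random.Random(0)
carrier = [0, 1, 2]
transposes = True   # the two sides are twists of one another
equivalent = True   # hence the two inclusions of (98) agree
for trial in range(300):
    R, S, T, U, V, X = (random_relation(rng, carrier) for _ in range(6))
    if trial % 2 == 0:
        # Enlarge U, V, X so that the left inclusion is sure to hold.
        U, V, X = U | R, V | S, X | T
    left_small, left_big = compose(split(R, S), T), compose(split(U, V), X)
    right_small = compose(split(R, converse(T)), converse(S))
    right_big = compose(split(U, converse(X)), converse(V))
    transposes = (transposes and twist(left_small) == right_small
                  and twist(left_big) == right_big)
    equivalent = equivalent and (
        (left_small <= left_big) == (right_small <= right_big))

print(transposes, equivalent)
```

Setting `T` to the identity relation on the carrier specializes the same check to (97).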



